All Questions
25 questions
0 votes
2 answers
147 views
I am getting all scores as 100% on my machine learning models. Is it okay to have this kind of result?
I am getting 100% on all scores for my ML model with the Extra Trees algorithm. I am applying the necessary pre-processing steps (duplicate removal, correlation validation, cardinality validation, ...
0 votes
1 answer
125 views
KMeans is not predicting the correct cluster
I ran k-means clustering and obtained 5 as the optimal number of clusters. (The clustering is uneven.) When I use the clusters in my model, the model does not choose the cluster that actually contains the data. ...
1 vote
0 answers
1k views
Downsampling in sklearn. Test and Train performance question
I have a class-imbalanced data set and have the following setup to handle the imbalance. I first split into train and test sets, perform downsampling only on the training set, and then get the test ...
1 vote
1 answer
104 views
Why do some people write long Python functions instead of using the scikit-learn library?
The other day I was exploring Kaggle, and I found that most people there don't use scikit-learn or any other library but instead write long Python scripts. For ...
1 vote
1 answer
153 views
How to predict future temperatures based on past data with years
Hello everyone, I am new to machine learning and prediction. I just want to know whether I can predict future temperatures based on past year's data, and how I can do it. Thanks. Here is the pic of the ...
0 votes
1 answer
62 views
Why does my model's output look correct pattern-wise but with a consistent offset?
I'm having an issue where my AutoML model's output has a consistent offset from the test dataset (image below). I'm wondering if anybody has any input on what could be causing this. My initial ...
1 vote
0 answers
29 views
What is the formula of the gradient boosting trees model?
I have been reading about gradient boosting trees (GBT) in some machine learning books and papers, but the references seem to describe only the training algorithms of GBT; they do not describe the ...
0 votes
2 answers
2k views
Gridsearch ValueError: Input contains infinity or a value too large for dtype('float64'). - Using Pipeline
Update: I have no NaN values, so fillna is not the issue; the dataset is clean. This error occurs when I try to predict using my grid's best params. I get a score when I fit it on the training data. I ...
0 votes
1 answer
1k views
ValueError: Found input variables with inconsistent numbers of samples: [6, 366]
I'm trying to split my x and y into train and test data for my ML model but it's giving me this error: ...
1 vote
1 answer
3k views
What is the difference between xgboost.sklearn.XGBClassifier and xgboost.XGBClassifier?
xgboost.sklearn vs. xgboost.XGBClassifier. Here is the code I used to train on make_moons ...
1 vote
1 answer
73 views
Classification-Based Collaborative Filtering Model
I was going through algorithms for collaborative filtering-based prediction. In most places, I read about using matrix factorization based on ratings that reflect what the user likes. But for my use ...
-1 votes
1 answer
890 views
Predicting the test data with LinearRegression model gives ValueError: shapes (8523,1606) and (1605,) not aligned: 1606 (dim 1) != 1605 (dim 0)
Fitting the model, testing it, and getting the score or r2 does not raise the error. But when I try to predict on the actual data, I get this ValueError: ...
5 votes
2 answers
804 views
Sklearn: applying cost complexity pruning along with pipeline
I have a data set with categorical variables. I have defined a decision tree algorithm and transformed these columns to their numerical equivalents using the one-hot encoding functionality in sklearn: Create ...
1 vote
1 answer
329 views
Random Parameters to fix in ML to perform controlled experiments
Many algorithms and methods in modern machine learning involve randomness, so running the same ML script several times can produce different outputs, and therefore accuracy ...
1 vote
1 answer
126 views
How to use a trained model
I just trained my first model in Python 3.7/scikit-learn (Linear Regression) (well, I copied most of the code, but it's something ^^). Now I want to actually use the model. Specifically, it's about sons ...